On March 4, 2009, the D0 and CDF collaborations at Fermilab's Tevatron Collider
submitted papers to Physical Review Letters announcing observation of single top quark
production. This review paper describes the successful searches carried out
independently by the two collaborations, allowing the reader to see the similarities and
differences that led to the simultaneous discoveries. Both collaborations measured a
cross section $\sigma(p\bar{p} \to tb + X,\ tqb + X)$ consistent with the standard model
prediction at 5.0 standard deviation significance, and set a lower limit on the quark
mixing matrix element $|V_{tb}|$ without assuming matrix unitarity with three quark
generations.

1. Introduction to the Top Quark

The top quark, an up-type quark, and the bottom quark, a down-type quark,
together form the third generation of quarks. Both top ($t$) and bottom ($b$)
quarks have spin 1/2. The top quark has electric charge $+2e/3$ and a mass of
$173.1 \pm 1.3$ GeV. The bottom quark has charge $-e/3$ and mass
$4.20^{+0.17}_{-0.07}$ GeV. All other quarks ($u$, $d$, $c$, $s$) are nearly massless
in comparison. The top quark has a lifetime of $0.5 \times 10^{-24}$ s that is much
smaller than the strong interaction timescale, and is thereby unique in the quark
family, decaying before it can form a bound state with another quark. Thus, the
kinematics of the particles from the top quark decay contain information about the
bare top quark itself. The Cabibbo-Kobayashi-Maskawa (CKM) matrix "$V$" describes
quark mixing. When there are exactly three quark generations, the matrix is unitary,
and a global fit to all available precision data constrains the element $|V_{tb}|$ to
be very close to one. Therefore, in the standard model (SM) the top quark decays almost
every time to a $W$ boson and a $b$ quark. The tiny SM values for $|V_{td}|$ and
$|V_{ts}|$ indicate that decays to $Wd$ and $Ws$ are extremely rare. If there were a
fourth quark generation ($t'$, $b'$), then decays to light quarks could occur more often
since $|V_{tb}|$ would no longer be constrained to have a value near one.

1.1. Production of Top Quarks at the Tevatron

Quarks are sensitive to both the strong and electroweak forces. The strong force
is far more powerful than the electroweak force, and thus top quarks are produced
most often at hadron colliders via the decay of a highly energetic virtual gluon to
a top quark and a top antiquark ($\bar{t}$). The rate for this process is about 7 pb at
the Tevatron, where it was first observed by the CDF and D0 collaborations
in 1995. Top quarks can also be produced without their antiparticle partner
via the electroweak interaction. In this mode, a t-channel virtual $W$ boson
and a highly energetic bottom quark combine and produce a top quark, or a far
off-shell s-channel $W$ boson decays to produce a top quark and a bottom antiquark.
A third process is predicted to exist that occurs via both the s-channel and
t-channel, when a top quark is produced together with a $W$ boson. Charge-conjugate
processes that produce top antiquarks are expected via the same mechanisms.
Contrary to expectations based on the relative feebleness of the electroweak
force, the rates for single top quark production are calculated to be quite high, at
about 2 pb for the t-channel process and 1 pb for the s-channel process. This
is because higher-order corrections to the tree-level calculations for the t-channel
process are large. (The rate for $tW$ production is predicted to be about 0.3 pb
and this process is not seen at the Tevatron.) Therefore, one might expect searches
of the Tevatron data at the D0 and CDF experiments to observe s-channel and
t-channel single top quark production rather easily, given that the current datasets
are 50 times larger than those used to discover the top quark in 1995 using the
pair production mode. The reason that it has been very difficult to observe single
top quark production is not that the signal rate is too low, but that the
background processes are over 30 times larger than for top quark pair events. The
main leading order Feynman diagrams for strong and electroweak production of top
quarks at the Tevatron are shown in Fig. 1.
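
To make the rate comparison above concrete, the following minimal sketch converts the
approximate cross sections quoted in the text into numbers of produced events using
$N = \sigma \times L$. The function name and the example integrated luminosity of
2.3 fb$^{-1}$ (the size of D0's observation dataset given in Section 4) are illustrative
choices, and the results count produced events before any trigger, acceptance, or
selection losses.

```python
# Illustrative only: produced-event counts N = sigma * L for the approximate
# cross sections quoted in the text (no detector or selection effects included).

PB_INV_PER_FB_INV = 1000.0  # 1 fb^-1 = 1000 pb^-1


def produced_events(cross_section_pb: float, luminosity_fb_inv: float) -> float:
    """Return the expected number of produced events for a given process."""
    return cross_section_pb * luminosity_fb_inv * PB_INV_PER_FB_INV


# Approximate cross sections from the text, in pb.
CROSS_SECTIONS_PB = {"ttbar": 7.0, "t-channel": 2.0, "s-channel": 1.0, "tW": 0.3}

if __name__ == "__main__":
    lumi_fb_inv = 2.3  # assumed example value
    for name, xsec in CROSS_SECTIONS_PB.items():
        print(f"{name:10s} {produced_events(xsec, lumi_fb_inv):8.0f} produced events")
```

Under these assumptions the single top quark processes yield thousands of produced
events, which illustrates why the difficulty lies in the backgrounds rather than in
the signal rate.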



2. The Search for Single Top Quarks at the Tevatron 

2.1. The Tevatron Collider and the D0 and CDF Experiments 

Fermi National Accelerator Laboratory is the home of the Tevatron Collider, 
a 6.3 km circumference proton-antiproton ($p\bar{p}$) accelerator with superconducting
magnets. This machine began operating in collider mode (versus earlier provision of 
fixed target beams) in 1985 with only the Collider Detector at Fermilab (CDF) in 
operation to record the collisions. The D0 experiment started operation in 1992, at 
which time the collision energy was 1.8 TeV. This was raised in 2001 to 1.96 TeV. 

The two large multipurpose detectors have similar structures: they consist
of concentric layers of detectors tightly packed around the beampipe at a Tevatron
collision region to a height of 9 m, with each detector layer having a different
purpose. The inner layers are composed of silicon microstrip detectors, which
provide the three-dimensional positions where charged particles pass through. D0
has an outer tracking detector of scintillating fibers and CDF has an outer gaseous
wire drift chamber. Each of these tracking systems is encased in a solenoid magnet
with field lines parallel to the beampipe; D0's field strength is 2.0 T and CDF's
is 1.4 T. The magnetic fields curve the tracks of charged particles, which enables
their transverse momentum to be measured. D0's magnet has a much smaller radius
(60 cm) compared to CDF's (150 cm) since it was retrofitted inside the calorimeter in
2001. It therefore does not allow as much space for the tracking detectors, resulting
in far fewer hits per track, which makes pattern recognition difficult and leads to
lower track reconstruction efficiency and higher fake track rates. The momentum
resolution in D0 is poorer because of the shorter track length.

Fig. 1. Representative leading order Feynman diagrams for strong production of top quark pairs
from (a) quarks and (b) gluons, and for electroweak production of single top quarks from (c)
s-channel "tb" production, (d) t-channel "tqb" production, and (e) "tW" production. Process (a)
produces 85% of the $t\bar{t}$ rate at the Tevatron. Process (e) is not observed there as the
cross section is too low. The notation "tb" refers to the $t\bar{b}$ and charge conjugate
$\bar{t}b$ processes together; "tqb" refers to $tq\bar{b}$ and $\bar{t}\bar{q}b$, and "tW"
refers to $tW^-$ and $\bar{t}W^+$.

Outside the central magnets, each detector has layers of calorimetry used to
measure the energy of particles and to distinguish between electromagnetic particles
(electrons and photons, with or without matching central tracks) and jets (from
quarks and gluons). D0's liquid-argon/uranium calorimeter is more hermetic and
covers a larger angular region (pseudorapidity $|\eta| < 4.2$ versus CDF's $|\eta| < 3.6$,
where $\eta = -\ln[\tan(\theta/2)]$ and $\theta$ is the polar angle), giving better
acceptance for forward jets and better missing transverse energy resolution.
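
The pseudorapidity limits quoted above can be translated into polar angles with a short
calculation. The sketch below implements the definition given in the text and its
inverse; the function names are illustrative and not taken from either collaboration's
software.

```python
import math


def pseudorapidity(theta_rad: float) -> float:
    """eta = -ln[tan(theta/2)], with theta the polar angle from the beam axis."""
    return -math.log(math.tan(theta_rad / 2.0))


def polar_angle(eta: float) -> float:
    """Inverse relation: theta = 2 * atan(exp(-eta)), in radians."""
    return 2.0 * math.atan(math.exp(-eta))


if __name__ == "__main__":
    # Calorimeter coverage limits quoted in the text for CDF (3.6) and D0 (4.2).
    for eta_max in (3.6, 4.2):
        theta_deg = math.degrees(polar_angle(eta_max))
        print(f"|eta| < {eta_max}: coverage down to {theta_deg:.2f} degrees from the beam")
```

A larger pseudorapidity limit corresponds to a smaller minimum polar angle, that is,
coverage closer to the beam direction.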

Outside the calorimeter, D0 has a muon spectrometer with up to four layers of
tracking detectors and a magnetic field strength of 1.9 T. Only muons (and invisible
neutrinos) pass right through the calorimeter, and their position and momentum are
remeasured here, with wide pseudorapidity coverage to $|\eta| < 2.0$. CDF also has
several layers of muon detectors outside its calorimeters with coverage to $|\eta| < 1.6$.

Each detector has a sophisticated multilevel trigger system used to select
interesting events from the $p\bar{p}$ collisions, which occur every 396 ns.

2.2. Search History 

Preparation for a search for single top quark production began at the D0 experiment
five months before the first observation of top quarks in pair production mode.
D0 published the results of a search using simple kinematic event selection to
set upper limits on the cross sections in 2000, with a follow-up analysis making
first use of a multivariate analysis technique, neural networks, to separate signal
from background in 2001. These analyses used 90 pb$^{-1}$ of data at $\sqrt{s} = 1.8$ TeV
from Run I at the Tevatron (1992-1996), when D0 did not have a silicon vertex
detector or central magnetic field and $b$ jets were identified via the presence of muons
in jets from the $b$ decay. The CDF collaboration also searched Tevatron Run I data
for single top quark production; they published upper limits on the cross sections
from a cut-based selection in 2002 and a follow-up analysis of the same dataset
using neural networks in 2004 [53]. CDF's analyses had the advantage of being able
to use secondary vertex $b$-jet identification using the Silicon Vertex Detector. The
limits on the cross sections for s-channel and t-channel production were about 10-20
times greater than the predicted values.

The Tevatron collision energy was increased to 1.96 TeV in 2001, and the beam
intensity was improved by a factor of about 15 over the course of the run
(2002-present). The D0 and CDF detectors were significantly upgraded, with the addition,
amongst other things, of the central solenoid magnet to D0 and very large silicon
tracking systems at both experiments. CDF analyzed 160 pb$^{-1}$ of Run II data
using a cut-based selection and a maximum-likelihood fit to the variable "lepton
charge $\times$ untagged jet pseudorapidity" and set 95% confidence level (CL) upper
limits of 13.6 pb on s-channel production and 10.1 pb on t-channel production
in 2005. D0 analyzed 230 pb$^{-1}$ of data using neural networks (NN) for
signal-background separation and a Bayesian binned likelihood calculation using the NN
output distributions, and set 95% CL upper limits of 6.4 pb in the s-channel and
5.0 pb in the t-channel in 2005.

The next step in the search led to a major improvement. The D0 collaboration
increased its dataset by a factor of four to 0.9 fb$^{-1}$, switched the search to tb+tqb
combined (assuming the SM ratio of the two parts), loosened the selection cuts and
used an improved $b$-jet identification algorithm to increase the signal acceptance
by 13% over that obtained in the earlier analysis, and applied three multivariate
methods to separate signal from background to reach 3.4 standard deviation ($\sigma$)
significance for a single top quark signal. The measured cross section for tb+tqb
production combined was $4.9 \pm 1.4$ pb. The measurement significance represents a
probability of 0.035% for the background to have fluctuated up and given a false
measurement of signal with a cross section of at least 4.9 pb. A significance greater
than $3\sigma$ is considered in the high energy physics community not to be sufficient for
a claim of discovery or first observation (which is set at $5\sigma$), but is high enough
to indicate that "evidence" for the process in question has been seen; it is a very
exciting threshold to reach. The result was published in 2007 and has received well
over 100 citations to date. Small improvements were made to the analysis and a
slightly more significant result ($3.6\sigma$) was published in a long paper in 2008.
The CDF collaboration performed a similar analysis on 2.2 fb$^{-1}$ of data and reached
a significance for single top quark signal of $3.7\sigma$, published in 2008. They measured
the cross section for tb+tqb production to be $2.2 \pm 0.7$ pb.
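
The correspondence between a significance in standard deviations and a background
fluctuation probability can be checked numerically. The sketch below assumes the
common convention of a one-sided Gaussian tail, which is not spelled out in the text;
the function name is illustrative.

```python
import math


def p_value_from_sigma(z: float) -> float:
    """One-sided Gaussian tail probability for a significance of z standard deviations."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


if __name__ == "__main__":
    # Thresholds and results mentioned in the text.
    for z in (3.0, 3.4, 3.6, 3.7, 5.0):
        print(f"{z:.1f} sigma -> p = {p_value_from_sigma(z):.2e}")
```

Under this assumption, a significance of 3.4 standard deviations corresponds to a
probability of about 0.034%, in line with the 0.035% quoted above once rounding of the
significance is taken into account.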

3. Measurement Overview 

After the "evidence" papers, the D0 and CDF collaborations each worked to
improve their analysis methods and apply them to larger datasets. Both collaborations
select events with one isolated high transverse momentum ($p_T$) lepton (electron
or muon) and large missing transverse energy ($\not\!\!E_T$), indicative of a leptonic
$W$-boson decay, together with two, three, or four jets. One or two of the jets is
identified as originating from a $b$ quark, which could be from the top quark decay or
produced together with it.

The CDF collaboration has an additional independent search channel that
requires no identified charged lepton, which picks up events lost to electron or muon
identification inefficiencies, and some $\tau$+jets events where the $\tau$ decayed
hadronically (but there was no explicit $\tau$ reconstruction). This is the first time
that the $\not\!\!E_T$+jets channel has been used in a single top quark measurement.

Both collaborations include signal and background events in their lepton+jets
channels with $t \to Wb$, $W \to \tau\nu_\tau$, and $\tau \to e\nu_e\nu_\tau$ or
$\tau \to \mu\nu_\mu\nu_\tau$. Neither includes events with $\tau \to$ hadrons in the
signal acceptance since hadronic $\tau$ reconstruction is difficult. A new search based
just on this decay channel has recently been completed by D0.

After event selection, the signal-to-background ratio is approximately 1:20. The
backgrounds are mostly $W$+jets events (especially at low jet multiplicity), followed
by $t\bar{t}$ pairs (especially at high jet multiplicity), with small contributions from
$Z$+jets, dibosons ($WW$, $WZ$, $ZZ$), and multijets. Top pairs look like signal when one
$W$ boson decays leptonically ($e\nu$ or $\mu\nu$) and the other decays hadronically
($u\bar{d}$, $c\bar{s}$, etc.), producing lepton+jets events, and also when both $W$'s decay
leptonically and event reconstruction fails to identify one of the leptons. $Z$+jets
events and some diboson processes also mimic single top quark signals when the $Z$ boson
decays to a pair of leptons ($e^+e^-$ or $\mu^+\mu^-$) and one of the leptons is lost,
generating fake $\not\!\!E_T$.

Multijet events look like signal in the electron channel when a jet is misidentified
as an electron and a jet's energy is mismeasured, creating false $\not\!\!E_T$. In the muon
channel, the multijet background comes mostly from $b\bar{b}$ events where one of the
$b$'s decays to a muon that travels wide of its jet or the jet is not reconstructed (for
example, because its energy is too low). Example Feynman diagrams for the $W$+jets and
multijets processes are shown in Fig. 2. After event selection, D0 has 4,519 lepton+jets
events and CDF has 3,315 lepton+jets events and 1,411 $\not\!\!E_T$+jets events.

Fig. 2. Representative leading order Feynman diagrams for the background processes: (a) $W$+jets
with real $b$ jets, (b) $W$+jets with a light jet mistagged as a $b$ jet, (c) multijets with a jet
misidentified as an electron, and (d) $b\bar{b}$ multijets with a nonisolated muon (from a $b$
decay) misidentified as an isolated one (from a $W$ decay).



4. Data Samples 

For the observation analysis, D0 uses 2.3 fb$^{-1}$ of Run II data, collected from
August 2002 until August 2007. The dataset is split in two parts ("Run IIa" and
"Run IIb") to denote a significant upgrade to the detector, when a new layer of
silicon microstrip detectors was added around the beampipe. This improved the
tracking and $b$-tagging efficiencies. The Run IIb half dataset has higher
instantaneous luminosity than the Run IIa half, which lowers the primary vertex
identification efficiency and increases track multiplicities in the events. The CDF
collaboration uses a 3.2 fb$^{-1}$ dataset for the lepton+jets analysis, which is not
split like D0's. The $\not\!\!E_T$+jets analysis uses 2.1 fb$^{-1}$ of data.

D0 selects data that pass any reasonable trigger. This requirement is relaxed
from earlier analyses where only lepton+jets triggers were used. The change
increases the signal acceptance by 16% in the electron channel and 20% in the
muon channel. It also increases the trigger efficiency to approximately 100%, meaning
that no correction functions are needed to model trigger turn-on curves for Monte Carlo
events. CDF's triggers include a high-$p_T$ electron trigger, a high-$p_T$ muon trigger,
and one that requires high $\not\!\!E_T$ and either an energetic electromagnetic cluster or
two jets.



5. Signal and Background Simulation 

5.1. Single Top Quark Signal Models 

The single top quark signal is modeled to reproduce next-to-leading order (NLO)
kinematics using modified leading order (LO) generators. D0 uses SINGLETOP,
a version of COMPHEP adapted by its authors for D0, and CDF uses MADEVENT,
based on MADGRAPH with their own modifications. In fact, s-channel simulation
at LO reproduces NLO kinematics without changes, and it is only the t-channel
that needs such attention. The transverse momentum distribution of the bottom
antiquark in the $2 \to 2$ process $q'b \to tq$ (from SINGLETOP or MADEVENT) after
back-propagation of the initial-state $b$ to $g \to b\bar{b}$ (from PYTHIA) is matched to
that of the $\bar{b}$ in the $2 \to 3$ process $q'g \to tq\bar{b}$ (from SINGLETOP or
MADEVENT). Simulated events from the $2 \to 2$ calculations are kept if
$p_T(\bar{b}) < 10$ GeV (D0) or $< 20$ GeV (CDF) and ones from the $2 \to 3$ process are
used if $p_T(\bar{b}) > 10$ GeV (D0) or $> 20$ GeV (CDF). The $2 \to 2$ process is scaled by
a $K$ factor to make the rates at the cut-off point match: $K = 1.21$ for D0. There is
another t-channel subprocess where a gluon produces a $t\bar{t}$ pair and the $\bar{t}$
combines with a radiated $W$ boson to produce a $\bar{b}$. This subprocess has a cross
section only a few percent of the tqb $g \to b\bar{b}$ subprocess, with a large negative
interference between the two subprocesses. Both D0's SINGLETOP and CDF's MADEVENT models
include the $g \to t\bar{t}$ subprocess and the interference. The models also both have
finite widths for the top quark ($\approx 1.5$ GeV) and $W$ boson ($\approx 2.0$ GeV). In
all signal (and $t\bar{t}$ background) models, the top quarks and their daughter $W$
bosons are decayed at the time of production, before later processing with PYTHIA, so
that all spin properties of the top quarks are preserved in the angular correlations of
the final decay products. D0 uses a top quark mass of 170 GeV for signal simulation;
CDF uses 175 GeV. Each value was chosen at a time when it was close to the world average
value, which shifts slightly once or twice a year as the measurement is improved. This
difference does not have a significant effect on the final results.
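
The matching of the $2 \to 2$ and $2 \to 3$ t-channel samples described above can be
pictured with a small sketch. The event container, the process labels, and the function
name below are hypothetical and are not taken from SINGLETOP or MADEVENT; the default
threshold and $K$ factor are D0's values from the text. CDF's $K$ factor is not quoted
in the text and would have to be supplied by the user.

```python
from dataclasses import dataclass


@dataclass
class SimEvent:
    process: str      # hypothetical labels: "2to2" or "2to3"
    pt_bbar: float    # transverse momentum of the bottom antiquark, in GeV
    weight: float = 1.0


def match_samples(events, pt_cut=10.0, k_factor=1.21):
    """Keep 2->2 events below the pT cut (scaled by K) and 2->3 events above it.

    Events lying exactly on the cut are dropped in this simple sketch.
    """
    kept = []
    for ev in events:
        if ev.process == "2to2" and ev.pt_bbar < pt_cut:
            kept.append(SimEvent(ev.process, ev.pt_bbar, ev.weight * k_factor))
        elif ev.process == "2to3" and ev.pt_bbar > pt_cut:
            kept.append(SimEvent(ev.process, ev.pt_bbar, ev.weight))
    return kept


if __name__ == "__main__":
    sample = [SimEvent("2to2", 5.0), SimEvent("2to2", 15.0),
              SimEvent("2to3", 5.0), SimEvent("2to3", 15.0)]
    for ev in match_samples(sample):
        print(ev)
```

For a CDF-like configuration one would call `match_samples(events, pt_cut=20.0,
k_factor=...)` with the appropriate scale factor.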

For modeling the parton kinematics in the protons and antiprotons, D0 uses the
CTEQ6M next-to-leading-order parton distribution functions (PDFs). CDF uses
the CTEQ5L leading order PDFs. The scale for the s-channel model is $M_{\rm top}^2$
(D0) or $\hat{s}$ (CDF), and for the t-channel model, $(M_{\rm top}/2)^2$ (D0) or
$Q^2 + M_{\rm top}^2$ (CDF) are used. D0's values in SINGLETOP are chosen to make the LO
and NLO cross sections be the same. CDF's values in MADEVENT are chosen to closely match
those used in the NLO ZTOP event generator. Both collaborations use PYTHIA to
add the underlying event from the $p\bar{p}$ interaction, the initial-state and final-state
radiation, and to hadronize and fragment the final state quarks and gluons into
jets. They also both use TAUOLA to decay tau leptons. For $B$-hadron decay, D0
uses EVTGEN from the BaBar experiment and CDF uses QQ from the CLEO
experiment. Events from multiple primary vertices are overlaid onto the primary
MC event with a Poisson multiplicity distribution in order to simulate the high
instantaneous luminosity. D0 uses zero-bias data events and CDF uses MC events
generated with PYTHIA. The mean number of $p\bar{p}$ collisions per bunch crossing for
this dataset is two for Run IIa and five for Run IIb.
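
As an illustration of the Poisson multiplicity distribution used for the overlaid
events, the sketch below evaluates the probability of observing a given number of
collisions in a bunch crossing for the two mean values quoted in the text. It is a
plain evaluation of the Poisson formula and does not reproduce either collaboration's
overlay procedure; the function name is illustrative.

```python
import math


def poisson_pmf(n: int, mean: float) -> float:
    """Probability of exactly n collisions when the mean number is `mean`."""
    return math.exp(-mean) * mean**n / math.factorial(n)


if __name__ == "__main__":
    # Mean numbers of collisions per bunch crossing quoted in the text.
    for label, mean in (("Run IIa", 2.0), ("Run IIb", 5.0)):
        probs = ", ".join(f"P({n})={poisson_pmf(n, mean):.3f}" for n in range(7))
        print(f"{label} (mean {mean:.0f}): {probs}")
```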

5.2. Background Models 

All background components except multijet events are simulated using Monte Carlo
models. Both collaborations use the ALPGEN event generator, which has leading-log
(LL) precision, coupled to PYTHIA to simulate $W$+jets events, including full
modeling of events with massive $b$ and $c$ jets. The version of ALPGEN used includes
parton-jet matching to avoid double-counting some regions of jet kinematics. The
samples are generated in the following sets (lp = light partons): $W$+0lp, $W$+1lp,
$W$+2lp, $W$+3lp, $W$+4lp, $W$+$\geq$5lp (this set includes $W$+single massless charm);
$Wc\bar{c}$+0lp, $Wc\bar{c}$+1lp, $Wc\bar{c}$+2lp, $Wc\bar{c}$+$\geq$3lp; and
$Wb\bar{b}$+0lp, $Wb\bar{b}$+1lp, $Wb\bar{b}$+2lp, $Wb\bar{b}$+$\geq$3lp, which are summed
weighted by the ALPGEN LL average cross section for each subset and then split to obtain
$W$+2jets, $W$+3jets, and $W$+4jets event sets. D0 uses this same version of ALPGEN and
generation method to simulate $t\bar{t}$ events; CDF uses PYTHIA, which only adds extra
jets through initial-state and final-state radiation (not from the hard scatter), but
this is not critical since they do not include events with four jets in their analyses.
Smaller backgrounds are modeled using ALPGEN and PYTHIA (D0) and PYTHIA (CDF).

Some more details of the $W$+jets modeling are in order, since this background
is critical in the most important 2-jets analysis channels. D0 uses the CTEQ6L1
parton distribution functions and CDF uses CTEQ5L. Both collaborations use scale
$Q^2 = M_W^2 + \sum m_T^2$ (recommended by the ALPGEN authors), where $m_T$ is the
transverse mass defined as $m_T^2 = m^2({\rm parton}) + p_T^2({\rm parton})$ and the sum
$\sum m_T^2$ extends to all final state partons (including the heavy quarks, excluding
the $W$ decay products). For $Wb\bar{b}$ and $Wc\bar{c}$ samples, $m({\rm parton}) = m_b$
or $m_c$. For $W$+light jets samples, the jets are treated as massless with
$m({\rm parton}) \approx 0$ GeV.
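
The scale definition above lends itself to a short numerical example. The sketch below
evaluates $Q^2 = M_W^2 + \sum m_T^2$ for a list of final-state partons. The $W$ boson
mass of 80.4 GeV, the $b$-quark mass of 4.7 GeV, and the parton transverse momenta are
assumed illustrative values, not settings taken from either collaboration's ALPGEN
configuration.

```python
import math

M_W = 80.4  # GeV; approximate W boson mass, assumed for illustration


def transverse_mass_sq(mass: float, pt: float) -> float:
    """m_T^2 = m^2 + p_T^2 for one final-state parton (GeV^2)."""
    return mass**2 + pt**2


def scale_q2(partons, m_w: float = M_W) -> float:
    """Q^2 = M_W^2 + sum of m_T^2 over partons given as (mass, pt) pairs in GeV."""
    return m_w**2 + sum(transverse_mass_sq(m, pt) for m, pt in partons)


if __name__ == "__main__":
    m_b = 4.7  # GeV; assumed b-quark mass for this example
    wbb = [(m_b, 30.0), (m_b, 25.0)]   # Wbb with two massive b quarks
    wjj = [(0.0, 30.0), (0.0, 25.0)]   # W + two massless light partons
    print(f"Wbb: Q = {math.sqrt(scale_q2(wbb)):.1f} GeV")
    print(f"Wjj: Q = {math.sqrt(scale_q2(wjj)):.1f} GeV")
```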

The multijet background is modeled by D0 using data with much looser lepton
selection than for signal selection. They select events that pass all final cuts in the
electron channel except that the electromagnetic object fails the electron
identification cuts, including not requiring a track matched between the primary vertex
and energy cluster in the calorimeter. This is a very loose selection, with a ten-fold
increase in statistics compared to that used in the earlier evidence analysis (when
a matching track was required). The reason for this change is to ensure sufficient
statistics after $b$ tagging to make a proper measurement of this background. In
D0's muon multijet data, the muon is not required to be isolated from a jet. In
the previous analysis, a partial isolation requirement was applied, and removing
this criterion increased the muon multijet background statistics by a factor of
ten. In the electron channel, the ratio of electron+jets and photon+jets events
is used to determine a reshaping weight as a function of the electron $p_T$ to make
the background model sample better match the actual multijet events remaining
after signal selection. The function boosts the fraction of low-energy events. In the
muon multijet background dataset, any jets close to the muon are removed and the
$\not\!\!E_T$ is recalculated in order to make the jets reproduce those in the signal data.
No kinematic reshaping is needed. To obtain background samples that model the
multijets backgrounds, the samples as described (with an electron or muon that fails
final identification criteria) are scaled by functions that represent the probability
for a failing lepton to pass the identification cuts.

CDF models the multijets background in the lepton+jets channels using a data
sample with $\not\!\!E_T$ below the signal selection threshold of 25 GeV, and projects it
into the high-$\not\!\!E_T$ signal region using a fit to the shape of the $\not\!\!E_T$
distribution. They model the dominant multijets background in the $\not\!\!E_T$+jets
channels using pretagged data that pass all selection cuts together with a tag-rate
matrix calculated using an independent $\not\!\!E_T$+jets dataset.

5.3. Detector Simulation

After the MC samples are generated, they are processed through code that models
the geometry and material of each subdetector system and then through further
code that generates digitized signals modeled to resemble those from the
subdetectors. After this, the MC events look very much like those from data and both are
processed through event reconstruction software to identify the correct primary
vertex, leptons, jets, and so on, ready for further analysis.

5.4. Background Normalization 

The $t\bar{t}$, $Z$+jets, and diboson backgrounds are normalized to (N)NLO theory cross
section values, with each collaboration using a $t\bar{t}$ cross section appropriate for
the top quark mass chosen for its analysis.

Before the $W$+jets backgrounds can be normalized, corrections are applied to
modify the leading log ALPGEN fractions of heavy flavor jets ($c$, $c\bar{c}$, and
$b\bar{b}$) to account for missing higher order contributions and make them match what is
seen in data. The D0 collaboration scales the $Wjj$, $Wcj$, $Wc\bar{c}$ and $Wb\bar{b}$
subprocesses to their NLO predictions (where $j = u, d, s, g$) using
$K' = \sigma_{\rm NLO}/\sigma_{\rm LL}$ and heavy-flavor $K_{\rm HF}$ factors. This is also
done for the small $Z$+jets background. The $K'$ factor is 1.30. $K_{\rm HF} = 1.47$ for
$Wc\bar{c}$ and $Wb\bar{b}$, 1.67 for $Zc\bar{c}$, and 1.52 for $Zb\bar{b}$.
These factors come from calculations using the NLO MC event generator MCFM.
For $Wcj$, $K_{\rm HF} = 1.38$, from a data measurement that agrees with NLO theory.
The important $Wb\bar{b}$ and $Wc\bar{c}$ subprocesses are then checked against data after
$b$ tagging and an empirical correction of $0.95 \pm 0.13$ is applied to get good
data-background agreement. This factor accounts for contributions to the heavy flavor
rate from Feynman diagrams at higher order than NLO not included in MCFM.
The uncertainty on the empirical correction factor is the third largest component
of the total systematic uncertainty on the cross section measurement. It includes a
9% statistical contribution from the variation of the correction when measured in
different analysis channels ($e$, $\mu$, 1-tag, 2-tags, Run IIa, Run IIb), 8% from the
$Wcj$ $K_{\rm HF}$ factor uncertainty (10%), and 7% from the uncertainty on the assumed
single top cross section (40%, based on the difference between D0 and CDF's published
evidence measurements). CDF compresses these three steps into one, and, from a
data-background comparison after $b$ tagging in $W$+1jet events, applies a scale factor
of $1.4 \pm 0.4$ to $Wcj$, $Wc\bar{c}$ and $Wb\bar{b}$ relative to the LL $Wjj$ process.
Converting D0's scale factors to allow a comparison gives $1.47 \times 0.95 = 1.40$, so
the two approaches are consistent.
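
The arithmetic behind these heavy flavor factors can be verified with a few lines of
code. The sketch below multiplies D0's two factors and, as an assumption not stated in
the text, combines the three listed uncertainty components in quadrature as if they
were independent and the only contributions. The function names are illustrative.

```python
import math


def combined_scale(k_hf: float, empirical: float) -> float:
    """Product of the NLO heavy-flavor factor and the empirical correction."""
    return k_hf * empirical


def quadrature_sum(components) -> float:
    """Combine fractional uncertainties assuming they are independent."""
    return math.sqrt(sum(c * c for c in components))


if __name__ == "__main__":
    k_hf, empirical = 1.47, 0.95           # values from the text
    components = (0.09, 0.08, 0.07)        # fractional components from the text
    rel_unc = quadrature_sum(components)
    print(f"combined D0 factor: {combined_scale(k_hf, empirical):.2f}")
    print(f"relative uncertainty: {rel_unc:.3f}")
    print(f"absolute uncertainty on {empirical}: {empirical * rel_unc:.2f}")
```

Under these assumptions the combined relative uncertainty is about 14%, which
corresponds to an absolute uncertainty of about 0.13 on the correction factor of 0.95,
in agreement with the value quoted above.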

The $W$+jets and multijets backgrounds are normalized to data before $b$ tagging.
D0 normalizes the sum of the two backgrounds using an iterative Kolmogorov-Smirnov
procedure with the $p_T$(lepton), $\not\!\!E_T$, and $W$ boson transverse mass $M_T(W)$
variables. For the multijets background, CDF uses a fit to the $\not\!\!E_T$ distribution at
low $\not\!\!E_T$ extrapolated to high $\not\!\!E_T$ and does not anticorrelate the two
components. After subtracting all other background components, they normalize the
$W$+jets background to the number of data events.

5.5. Model Corrections

Both collaborations need to correct the MC efficiency to reproduce the efficiency
of the detector, event reconstruction, and particle identification. This is done for
electrons, muons, and jets. All MC events are reweighted to make the instantaneous
luminosity distribution (number of overlaid zero-bias events from multiple $p\bar{p}$
collisions) match that observed in the data. D0 also reweights the muon pseudorapidity
$\eta$ distribution in $W$+jets events to better model the efficiencies of the regions
between the central and forward muon systems.

For $W$+jets events, both collaborations find the pseudorapidity distributions of
the jets from the ALPGEN simulation do not match data well (there are presumed to
be slightly different Feynman diagrams in the calculation compared with, e.g., the
SHERPA model, which has wider jet $\eta$ distributions). The ALPGEN distributions are
too narrow, and empirical reweightings are applied to these distributions ($\eta$(jet1),
$\eta$(jet2), $\Delta\phi$(jet1, jet2), and $\Delta\eta$(jet1, jet2) for D0, similarly for
CDF) to make the background model match data before $b$ tagging. Since D0's reweighting
uses binned functions derived in each analysis channel separately, it also takes account
of imperfections of the detector model in the intercryostat regions.
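
A binned reweighting of the kind described above can be sketched as follows. The
example derives one weight per histogram bin so that the normalized simulated shape
matches the normalized data shape. It is a generic illustration with made-up bin
contents and a hypothetical function name; the actual binning, variable ordering, and
treatment of non-$W$+jets components used by D0 and CDF are not specified in the text.

```python
def bin_weights(data_counts, mc_counts):
    """Per-bin weights that make the normalized MC shape match the data shape.

    Bins with no MC events receive a weight of 1.0 in this simple sketch.
    """
    n_data, n_mc = sum(data_counts), sum(mc_counts)
    weights = []
    for d, m in zip(data_counts, mc_counts):
        weights.append((d / n_data) / (m / n_mc) if m > 0 else 1.0)
    return weights


if __name__ == "__main__":
    # Made-up example: data are wider in eta than the simulation.
    data = [30.0, 40.0, 30.0]   # forward, central, backward bins
    mc = [20.0, 60.0, 20.0]
    w = bin_weights(data, mc)
    print("weights:", [round(x, 3) for x in w])
    print("reweighted MC:", [round(m * x, 1) for m, x in zip(mc, w)])
```

Because only the shape is adjusted, the total simulated yield is unchanged and the
overall normalization remains a separate step.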

6. Event Reconstruction and Particle Identification 

6.1. Primary Vertices 

There are several primary vertices in each event, on average, because of the high
collision rate leading to multiple interactions. They are reconstructed at D0 by first
clustering tracks according to their positions along the beamline, then measuring the
location and width of the beam and using them to refit the tracks. Finally, each
cluster of tracks is associated with a vertex, and the one with the lowest probability
of coming from a zero-bias collision is chosen as the primary vertex for that event.

6.2. Electrons 

Electrons are defined as clusters of energy deposited in the electromagnetic section 
of the calorimeter that are consistent in shape and other properties with an electro- 
magnetic shower. The cluster must be isolated from other energy in the event and 
have a track that points to it from the primary vertex. 

6.3. Muons 

Muons are identified by matching reconstructed tracks from the outer muon system 
to ones from the inner tracking system. The match is made spatially and (at D0) 
in transverse momentum and muon charge. Muons must be isolated from nearby 
tracks and jets to show they are from W (or Z) boson decay and not from heavy 
flavor ($b$ or $c$) decay inside a jet.

6.4. Jets

Jets are reconstructed using energy deposited in the calorimeters. D0 applies the
midpoint cone algorithm in ($y$, $\phi$) space, where $y$ is the rapidity and $\phi$ is the
azimuthal angle, and the cone radius is 0.5. CDF uses a clustering algorithm in
($\eta$, $\phi$) space with a cone radius of 0.4. There are several requirements on where
the energy is deposited to reject noisy jets (whose energy would be mismeasured). The
energy of each jet is corrected if there is a muon in the jet, to account for energy
taken away by that muon and associated (invisible) neutrino from a heavy quark
decay. The jet's energy is also corrected using the jet energy scale calibration to
ensure that the absolute value is correct. For most jets, the uncertainty on
the jet energy scale is between 1% and 2% for D0 and it is 3% for CDF.

6.5. Missing Transverse Energy 

The missing transverse energy is computed by adding up vectorially the transverse 
energies in all cells of the electromagnetic and fine (inner) hadronic calorimeters 
(for D0). Cells in the coarse (outer) hadronic calorimeter are only added if they 
form part of a good jet. This quantity is corrected for all the energy corrections 
applied to other objects in the event and for the momentum of isolated muons. 
CDF's computation of the missing transverse energy is similar.
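
The vector sum described above can be illustrated with a minimal sketch. It assumes
the usual convention that the missing transverse energy is the negative of the vector
sum of the measured transverse energies, and it omits the corrections for jet energy
scale, muons, and the coarse hadronic calorimeter cells mentioned in the text. The
function name and input format are hypothetical.

```python
import math


def missing_et(cells):
    """Return (magnitude, azimuth) of the missing transverse energy.

    `cells` is an iterable of (et, phi) pairs: transverse energy in GeV and
    azimuthal angle in radians. The result balances the vector sum of the cells.
    """
    sum_x = sum(et * math.cos(phi) for et, phi in cells)
    sum_y = sum(et * math.sin(phi) for et, phi in cells)
    met_x, met_y = -sum_x, -sum_y
    return math.hypot(met_x, met_y), math.atan2(met_y, met_x)


if __name__ == "__main__":
    # Made-up deposits: 30 GeV at phi = 0 partly balanced by 10 GeV at phi = pi.
    magnitude, phi = missing_et([(30.0, 0.0), (10.0, math.pi)])
    print(f"missing ET = {magnitude:.1f} GeV at phi = {phi:.2f} rad")
```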

7. Event Selection 

The analyses start out with very large numbers of events in data and MC signal 
and background samples. For example, D0 uses data skims with one electron or one 
muon in them, which contain 1.2 billion events, and 85 million MC events. From 
these samples, the analyses first select events that look like signal and reject events 
that do not. That is, each collaboration devises selection cuts designed to keep as 
many MC signal events as possible while rejecting as much background as they can. 
The D0 collaboration chooses to maximize signal acceptance while allowing for a 
slightly worse signal-to-background ratio, whereas the CDF collaboration chooses 
tighter selection cuts that produce a lower signal acceptance but better signal- 
to-background ratio. Thus, although D0 starts the analysis with about 30% less 
integrated luminosity to analyze than CDF, they end up with more expected signal 
events in the lepton+jets channel, and a similar number in total when considering
also the $\not\!\!E_T$+jets channel after all selections are applied. D0 pursues this strategy
because their studies show that the overall sensitivity of the analysis is proportional 
to the signal acceptance. 

7.1. Kinematic Cuts 

The kinematic cuts used in the analyses are shown in Table 1. For simplicity, only the
cuts in the channels with exactly two jets are shown, since these channels contribute
most to the analysis sensitivity. For the lepton+jets analyses, both collaborations
also use events with three jets, and D0 also uses events with four jets. These
channels have slightly harder cuts for electron p_T, missing E_T, and total transverse
energy H_T than those shown in the table, to reject the higher multijets background.



Table 1. Kinematic selection cuts used in the 2-jets analysis channels to identify
events that look like single top quark signal and reject backgrounds.

              D0's Selection                        CDF's Selection
              Lepton+2Jets                          Lepton+2Jets            Missing E_T+2Jets

Electron      p_T > 15 GeV                          p_T > 20 GeV
              |η| < 1.1                             |η| < 1.6
Muon          p_T > 15 GeV                          p_T > 20 GeV
              |η| < 2.0                             |η| < 1.6
Neutrino      missing E_T > 20 GeV                  missing E_T > 25 GeV    missing E_T > 50 GeV
Jet1          p_T > 25 GeV                          p_T > 20 GeV            p_T > 35 GeV
              |η| < 3.4                             |η| < 2.8               |η| < 0.9
Jet2          p_T > 15 GeV                          p_T > 20 GeV            p_T > 25 GeV
              |η| < 3.4                             |η| < 2.8               |η| < 2.8
Total E_T     H_T(jets, e, missing E_T) > 120 GeV
              H_T(jets, μ, missing E_T) > 110 GeV

Motivation for D0's choice of lower transverse energy thresholds and wider jet
pseudorapidity distributions than used in, for example, a top pairs measurement
can be seen in Fig. 3 for the t-channel single top quark process. The light quark
that radiates the W boson has a very wide η distribution in both the forward and
backward directions (shown by the red histograms in the plots). This is a very strong
signature for single top quark production that will be used as a powerful variable
to separate signal from background. The soft b produced from the gluon splitting
has an even wider η distribution (dark green histograms) and low p_T, and finding
this jet increases the double-b-tagged signal acceptance.

Fig. 3. Distributions of (a) the transverse momentum and (b) the signed pseudorapidity of partons
in t-channel single top quark events, from the COMPHEP-SINGLETOP simulation.

There are additional selection cuts not shown in Table 1. In the lepton+jets
channels, events are rejected if there is a second isolated lepton, which rejects
dilepton decays of tt̄, Z+jets, and diboson events. D0 has an upper cut on missing E_T
of 200 GeV to reject misreconstructed events. Both collaborations throw out events
with low missing E_T just above the cut thresholds when it is aligned or back-to-back with
one of the objects in the event, indicative of a misreconstructed event. The primary
vertex must be clearly identified and near the center of the detector, and the lepton
must originate from it. The regions between D0's central and end calorimeter
cryostats are tricky to instrument and model accurately, and if the leading jet in the
muon analysis channel points to this region, the threshold on it is raised to 30 GeV.
Finally, D0 has cuts on muon track curvature significance designed to reject events
where the muon has been misreconstructed. In CDF's missing E_T+jets analysis, a neural
network with 15 input variables is trained to separate the multijets background
from signal and a cut is placed on the output distribution.

After the kinematic event selection, D0's background samples retain 4 million 
MC events and 0.8 million multijet data events, and there are 0.5 million single 
top quark signal MC events. The signal data contain 114,777 events, with predicted 
background components: Wjj = 71%, Wcj = 6%, Wcc = 6%, Wbb = 3%, Z+jets 
= 6%, dibosons = 2%, tt̄ = 1%, and multijets = 5%. The expected single top quark
signal is tb = 0.13% and tqb = 0.26%, with a signal-to-background ratio for tb+tqb 
of 1:260. Clearly, an additional method is needed to select events for the analyses 
to stand any chance of finding the single top quark signal. 

7.2. Heavy-Flavor Jet Tagging 

The most powerful part of event selection is the identification of jets that originate
from b quarks. The algorithms use the long decay time of the B hadrons (mean
lifetime ~ 1.5 x 10^-12 s), which results in detached secondary vertices in the jets (>
1 mm between the primary and secondary vertices), together with other information
about the tracks to find b jets. The tagging algorithms are applied directly to jets in
data and to most MC events at CDF, and are modeled with tag-rate functions for
MC events at D0 together with taggability-rate functions to reproduce the detector
geometric acceptance and operating efficiency. For W+light jets MC, CDF uses tag-
rate functions measured in multijets data. b-jet identification is implemented at D0
by combining all the track and vertex information using a neural network. CDF
uses the significance of the decay length of the secondary vertex in the (r, φ) plane
for the lepton+jets and missing E_T+jets channels, and also a jet probability algorithm
in the missing E_T+jets channel. Depending on where a cut is put on these variables, one
can define looser or tighter b tagging, where "loose" means higher probability to
tag a b jet (58% for b jets within D0's Silicon Microstrip Tracker fiducial geometric
acceptance) with associated higher probability to mistag a non-b jet (17% for charm
jets and 1.8% for light quark and gluon jets), and "tight" means a lower b-tag
probability (47% for b jets at D0) with associated lower fake tag rates (10% for
charm jets and 0.5% for light jets). D0 requires one tight-tagged jet (and no loose-tagged
jet) for its single-tagged analysis channels, and two loose-tagged jets for its double-
tagged channels. CDF has one set point for both single-tagged and double-tagged
lepton+jets channels, with efficiencies of 50% (b), 9% (c), and 1% (j) for fiducial
jets within the Silicon Detector tracking system.
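
To illustrate how per-jet tag probabilities from tag-rate functions can be turned into event-level probabilities in simulated events, the sketch below computes the probability of finding exactly n tagged jets. It is a simplified illustration, not D0's implementation: it assumes a single operating point, treats the jets as independent, and does not reproduce D0's single-tag definition (one tight tag and no additional loose tag). The function name is hypothetical; the 47% and 0.5% values are the tight-tag rates quoted above.

```python
def tag_multiplicity_probs(tag_probs):
    """Return [P(0 tags), P(1 tag), ..., P(all jets tagged)].

    tag_probs: per-jet tagging probabilities, e.g. from a tag-rate
    function evaluated for each jet (assumed independent).
    """
    probs = [1.0]  # before looking at any jet: zero tags with certainty
    for p in tag_probs:
        updated = [0.0] * (len(probs) + 1)
        for n_tags, weight in enumerate(probs):
            updated[n_tags] += weight * (1.0 - p)   # this jet is not tagged
            updated[n_tags + 1] += weight * p       # this jet is tagged
        probs = updated
    return probs

# Illustrative 2-jet event: one b jet (47%) and one light jet (0.5%).
p_zero, p_one, p_two = tag_multiplicity_probs([0.47, 0.005])
```

In such a scheme each simulated event would enter the single-tagged sample with weight `p_one` and the double-tagged sample with weight `p_two`, rather than being accepted or rejected outright.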

7.3. Analysis Channel Separation 

To improve the sensitivity of the measurement, both D0 and CDF split their
datasets into independent channels using the jet multiplicity (2, 3, and 4 for D0),
number of b-tagged jets (1 or 2), lepton flavor (D0 only, electron or muon), trigger
type (CDF only, lepton, for muon+jets) and data-collecting period (D0 only,
Run IIa and Run IIb), giving 24 independent lepton+jets analysis channels for
D0 and eight for CDF. CDF's missing E_T+jets channel with no isolated lepton is split
by the number and type of b tags (one "SecVtx"-tagged jet, two "SecVtx"-tagged
jets, and one "SecVtx" and one "JetProb"-tagged jet). Measurements are made in
each channel and combined at the end of the analysis. The signal-to-background
ratios vary from 1:10 (2-jets/2-tags) to 1:37 (4-jets/2-tags) for D0, with the most
important 2-jets/1-tag channels having S:B = 1:20. CDF's channels have S:B = 1:15
in the 2-jets channels (1-tag and 2-tag combined), S:B = 1:23 in the 3-jets channels,
and S:B = 1:23 in the missing E_T+jets channels.
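
The channel counts quoted above follow from multiplying the number of options in each category. The short sketch below enumerates them; the labels for CDF's two trigger types are placeholders (an assumption), and CDF's use of 2-jet and 3-jet events is taken from the discussion of the kinematic cuts.

```python
from itertools import product

# D0: jet multiplicity x number of b tags x lepton flavor x run period.
d0_channels = list(product(
    (2, 3, 4),
    (1, 2),
    ("electron", "muon"),
    ("Run IIa", "Run IIb"),
))

# CDF lepton+jets: jet multiplicity x number of b tags x trigger type.
# The trigger labels are hypothetical placeholders.
cdf_channels = list(product(
    (2, 3),
    (1, 2),
    ("trigger type A", "trigger type B"),
))

n_d0 = len(d0_channels)    # 3 * 2 * 2 * 2
n_cdf = len(cdf_channels)  # 2 * 2 * 2
```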

8. Signal Acceptances and Event Yields 

After all event selections have been applied, the signal acceptances (percentage of 
total cross section) for D0 are (3.7 ± 0.5)% for the s-channel tb process and (2.5 ± 
0.3)% for the t-channel tqb process. The t-channel process has a lower acceptance 
because the b jet has low transverse momentum and is difficult to identify. CDF's 
signal acceptances in the lepton+jets channels are 2.7% for the tb process and 1.8%
for the tqb process. These values are lower than D0's because of the more restrictive
trigger requirements, tighter kinematic selection, and tighter b tagging in the double-
tagged channel. In addition, CDF has the missing E_T+jets channel with a signal acceptance
of 1.1% for tb+tqb combined.

Table 2 shows the numbers of signal and background events expected, and the
numbers of data events found. For simplicity here, all analysis channels have been
combined. Four notes to understand the table are in order: (i) Remember that D0
uses m_t = 170 GeV and CDF uses 175 GeV for single top signal and tt̄ background,
with associated higher theory cross sections for the lower top quark mass. They
each also use different theory calculations for these values: for single top, D0 uses
Kidonakis 2006 values of 1.12 ± 0.05 pb (tb) and 2.34 ± 0.13 pb (tqb), and for tt̄
they use the Kidonakis and Vogt 2003 value of 7.91 pb, quoted with an asymmetric
uncertainty that includes a component for the top quark mass. CDF uses for single
top the Harris et al. 2002 values of 0.88 ± 0.12 pb (tb) and 1.98 pb (tqb, quoted with
an asymmetric uncertainty), and for tt̄ they use
the Bonciani et al. 1998 value of 6.70 ± 1.32 pb. Thus, a direct comparison of the
signals and tt̄ backgrounds is valid only after one experiment's numbers have been
rescaled to the other's assumptions. (ii) D0's analysis includes channels with four jets
and CDF's does not, so the fraction of tt̄ events expected by D0 is higher than at CDF
when showing yields with all channels combined. However, when one considers each jet
multiplicity channel separately, then the relative fractions of W+jets, tt̄, etc. are very
similar between the two experiments. (iii) CDF's missing E_T+jets channel W+jets yield
does not include Wjj where j = a light jet. (iv) CDF's missing E_T+jets channel multijets
yield includes also the Wjj events.



Table 2. Numbers of events after all selections have been applied. See comments in the
text on how to compare the columns.

                      D0's Yields         CDF's Yields
                      Lepton+Jets,        Lepton+Jets,        Missing E_T+Jets,
                      2.3 fb^-1           3.2 fb^-1           2.1 fb^-1

tb+tqb signal           223 ± 30            191 ± 28             64 ± 10
W+jets                2,647 ± 241         2,204 ± 542           304 ± 116
Z+jets, dibosons        340 ± 61            171 ± 15            171 ± 54
tt̄ pairs              1,142 ± 168           686 ± 99            185 ± 30
Multijets               300 ± 52            125 ± 50            679 ± 28
Total prediction      4,652 ± 352         3,377 ± 505         1,403 ± 205
Data                  4,519               3,315               1,411


Figure 4 shows the reconstructed W boson transverse mass distributions from
D0 (all channels combined) and CDF (lepton+2jets channels). The transverse mass
is defined as:

  $M_T(W) = M_T(\ell, \nu) = \sqrt{2\, p_T(\ell)\, \not\!\!E_T \left[1 - \cos\left(\phi(\ell) - \phi(\not\!\!E_T)\right)\right]}$
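
As a small worked example of the transverse mass definition, the sketch below evaluates it for a lepton and missing transverse energy given by their magnitudes and azimuthal angles. The function and argument names are assumptions made for illustration.

```python
import math

def w_transverse_mass(lepton_pt, lepton_phi, met, met_phi):
    """Transverse mass of the lepton + missing-E_T system, in the
    same units as the inputs (GeV here)."""
    delta_phi = lepton_phi - met_phi
    return math.sqrt(2.0 * lepton_pt * met * (1.0 - math.cos(delta_phi)))

# Lepton and missing E_T back-to-back, 40 GeV each: M_T = 80 GeV.
mt_back_to_back = w_transverse_mass(40.0, 0.0, 40.0, math.pi)

# Lepton and missing E_T collinear: M_T vanishes.
mt_collinear = w_transverse_mass(40.0, 1.0, 40.0, 1.0)
```

The back-to-back configuration gives the largest transverse mass for fixed momenta, which is why the distribution for real W bosons has an edge near the W boson mass.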



(Figure 4 panels: (a) D0, 2.3 fb^-1; (b) CDF, 3.2 fb^-1. Horizontal axis in both panels: W Boson Transverse Mass [GeV].)



Fig. 4. Distributions of the W boson transverse mass for (a) D0, with all analysis channels
combined, and (b) CDF, with all lepton+2-jets/1-tag channels combined.

9. Background Model Checks

Any analysis of the data is only valid if the background models reproduce the data
in all variables used for event selection and to separate signal from background. In
addition to checking the background model agreement with data for these
distributions for every particle in each analysis channel, extensive cross checks using
other data samples have been performed to ensure the separate components of the
background model are accurately modeled. Samples that pass all selection cuts are
used before b tagging to certify the shape of the W+light jets background model.
The W+heavy flavor background model's agreement with data
in both shape and normalization is checked using a sample with exactly two
jets, with one b tagged, and H_T(ℓ, missing E_T, jets) < 175 GeV in D0's analysis. Finally,
the tt̄ background is validated in both normalization and shape using data and
MC samples with four jets, one or two b tags, and, for D0 only since they have
softer object E_T requirements, H_T(ℓ, missing E_T, jets) > 300 GeV. Many distributions are
checked using these three cross-check samples and good agreement between data and
background model is found. Figure 5 shows the transverse mass of the W boson as
an example.



(Figure 5 panels: Pretagged Cross-Check Sample, W+Jets Cross-Check Sample, tt̄-Pairs Cross-Check Sample, Untagged Cross-Check Sample, tt̄-Pairs Cross-Check Sample. Horizontal axis in each panel: W Boson Transverse Mass [GeV].)



Fig. 5. Distributions of the W boson transverse mass for several cross-check samples: (a) D0's
pretagged events, with all analysis channels combined, (b) D0's W+jets cross-check sample,
(c) D0's tt̄ pairs cross-check sample, (d) CDF's two-jets untagged sample, and (e) CDF's tt̄
cross-check sample.

10. Systematic Uncertainties

The uncertainties in all searches are dominated by the statistical uncertainty from
the size of the data sample. However, once there is enough data to observe and
measure a signal, then systematic contributions to the total uncertainty become
important. The total uncertainty on the cross section measurement by D0 is ±22%,
and for CDF it is +29%, -24% in the lepton+jets channel, +52%, -46% in the
missing E_T+jets channel, and +26%, -22% with these channels combined. The
contribution from the data statistics in D0's measurement is ±18%, leaving ±13%
from systematic components. Normalization systematic uncertainties and shape-
dependent systematic uncertainties are considered separately for each signal and
background source in each analysis channel. The overall background uncertainty
varies between 7% and 15% for the individual channels in D0's measurement.
Shape and normalization uncertainties combined result in 20% uncertainties on the
background model for single-tagged channels and 40% uncertainties on background
for double-tagged channels, for events most like signal. The uncertainties on the
background model for events most like background are about 10% for single-tagged
channels and 15% for double-tagged channels. D0 measures systematic uncertainty
contributions from 23 different sources. Others were considered but found to be
negligible. The largest source of systematic uncertainty comes from the b-ID tag-rate
functions, including both normalization and shape parts, followed by the jet energy
scale calibration (also normalization and shape), and the heavy-flavor correction
factor for the Wbb and Wcc fractions in the MC model. Smaller contributions (in
descending order) come from the integrated luminosity, the jet energy resolution,
initial-state and final-state radiation, b-jet fragmentation, the tt̄ pairs cross section,
and lepton identification. CDF's analyses include normalization uncertainty terms
from 16 sources, and shape terms from a subset of nine sources. The most important
ones are the jet energy scale, the event detection efficiency, and the Wbb, Wcc, and
Wcj scale factor.
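
The ±13% systematic component quoted for D0 is consistent with subtracting the ±18% statistical component from the ±22% total in quadrature. The sketch below checks this arithmetic, assuming the two components are uncorrelated and therefore add in quadrature; the function name is hypothetical.

```python
import math

def remaining_in_quadrature(total, part):
    """Component left over when `part` is removed in quadrature from `total`."""
    return math.sqrt(total ** 2 - part ** 2)

# D0: total uncertainty 22%, statistical component 18%.
systematic_percent = remaining_in_quadrature(22.0, 18.0)
# sqrt(484 - 324) = sqrt(160), about 12.6, which rounds to 13.
```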

11. Signal-Background Separation

The sensitivity to observe a signal with a large background is greatly improved 
by finding a variable that has a different shape for signal than for background. 
One can then keep only events in the maximal-signal region and measure a cross 
section by counting events if there is enough data, or, as in the case of single top 
quark production with only a few inverse femtobarns of data, one can perform 
a binned likelihood calculation comparing the shapes of the expected signal and 
background to data across the full distribution to further improve the sensitivity. 
Since the kinematics of single top quark events lie between those of the dominant
lower-energy W+jets and higher-energy tt̄ backgrounds, it is not possible to find a
single simple variable with which to perform this calculation. Hence, D0 and CDF
each combine many variables using several different methods to increase the signal-
background separation power. D0 uses three discrimination methods and CDF uses
five in the lepton+jets channel, one in a separate s-channel tb search, and one in
the missing E_T+jets channel, which are briefly described here; more details are available
elsewhere.

11.1. Discriminating Variables 

D0 uses 97 discriminating variables in its final analysis, chosen from a much longer
list to include only those variables with a different distribution for signal and at
least one of the background components, and also to have good
agreement between the shape of the background sum and data. The variables fall
in five categories: object kinematics, event kinematics, jet reconstruction, top quark
reconstruction, and angular correlations. The most powerful ones for separating
single top quark signal from the W+jets and tt̄ backgrounds in each category are
shown in Table 3.

Table 3. 30 of the 97 variables used by D0 that have the best separation between
the single top quark signal and W+jets or tt̄ pairs.

Variable Type         Separate Single Top from:
                      W+Jets                                  tt̄ Pairs

Object                p_T(jet2)                               p_T(notbest2)
Kinematics            p_T^rel(jet1, tag-μ)                    p_T(jet4)
                      E(light1)                               p_T(light2)

Event                 M(jet1, jet2)                           M(alljets - tag1)
Kinematics            M_T(W)                                  Centrality(alljets)
                      H_T(lepton, missing E_T, jet1, jet2)    M(alljets - best1)
                      H_T(jet1, jet2)                         H_T(alljets - tag1)
                                                              H_T(lepton, missing E_T, alljets)
                                                              M(alljets)

Jet                   Width_φ(jet2)                           Width_η(jet4)
Reconstruction        Width_η(jet2)                           Width_φ(jet4)
                                                              Width_φ(jet2)

Top Quark             M_top(W(S1), tag1)
Reconstruction        M_top(W(S2), tag1)

Angular               cos(light1, lepton)_btaggedtop          cos(lepton_btaggedtop, btaggedtop_CM)
Correlations          Δφ(lepton, missing E_T)                 Q(lepton) x η(light1)
                      Q(lepton) x η(light1)                   ΔR(jet1, jet2)




Some comments on the notation are in order. The numbering n of jetn, tagn,
lightn, etc. refers to the transverse momentum ordering of the jets: 1 is the
highest-p_T jet of that type of jet, 2 is the second-highest-p_T jet, and so on. "tag"
means a b-tagged jet. "light" means an untagged jet (it fails the b-tag criteria).
"best" means the jet which, when combined with the lepton and missing transverse
energy, produces a top quark mass closest to 170 GeV (the value at which
D0's analysis is performed). "notbest" means any jet that is not the best jet.
"alljets" means include all the jets in the event in the global variable. p_T is the
transverse momentum. E is the particle's energy. Q is the particle's charge. H_T
is the scalar sum of the particles' transverse energies. M is the invariant mass
of the objects. M_T is the transverse mass of the objects. p_T^rel is the transverse
momentum of the muon closest to the jet relative to that jet. S1 and S2 are the
two solutions for the neutrino longitudinal momentum when solving the W boson
mass equation, and S1 is the smallest absolute value of the two (the preferred
value). ΔM_top^min is the difference between 170 GeV and the reconstructed top quark
mass using the jet and neutrino solution that make the mass closest to 170 GeV.
$\Delta R(\mathrm{object1}, \mathrm{object2}) = \sqrt{\Delta\phi(\mathrm{object1}, \mathrm{object2})^2 + \Delta\eta(\mathrm{object1}, \mathrm{object2})^2}$. Finally,
subscripted text in the cosines indicates the rest frame in which to measure the
variable in question. "CM" is the center of mass frame of the whole final state.
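
Two of the quantities defined above lend themselves to a short sketch: the two neutrino longitudinal momentum solutions from the W boson mass constraint, and the angular separation ΔR. The code below is an illustration under stated assumptions, not the collaborations' code: the charged lepton is treated as massless, the W boson mass value is an approximate input, the function names are hypothetical, and when the quadratic has no real solution the real part is used for both solutions (a common convention, assumed here).

```python
import math

M_W = 80.4  # GeV, approximate W boson mass (assumed input value)

def neutrino_pz_solutions(lepton, met_x, met_y, m_w=M_W):
    """Solve the W mass constraint for the neutrino p_z.

    lepton: (px, py, pz) of a massless charged lepton.
    Returns (S1, S2), ordered so that |S1| <= |S2|.
    """
    lx, ly, lz = lepton
    lep_pt2 = lx * lx + ly * ly
    lep_e = math.sqrt(lep_pt2 + lz * lz)
    met2 = met_x * met_x + met_y * met_y
    mu = 0.5 * m_w * m_w + lx * met_x + ly * met_y
    disc = mu * mu - lep_pt2 * met2
    # If the discriminant is negative, keep only the real part.
    root = lep_e * math.sqrt(disc) if disc > 0.0 else 0.0
    s_plus = (mu * lz + root) / lep_pt2
    s_minus = (mu * lz - root) / lep_pt2
    return tuple(sorted((s_plus, s_minus), key=abs))

def delta_r(eta1, phi1, eta2, phi2):
    """Separation in (eta, phi) space, with delta-phi wrapped into [-pi, pi)."""
    dphi = (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(eta1 - eta2, dphi)
```

With this ordering, the first returned value plays the role of S1 (the preferred solution) and the second that of S2.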

The CDF collaboration uses fewer variables with their discriminants, but they
have one very powerful variable not developed by D0: the jet flavor separator.
This takes all parameters that describe the tracks in b-tagged jets and combines
them using a neural network to calculate a probability that the jet is a bottom jet,
or a charm or light quark or gluon jet. This variable increases the signal-background
separation sensitivity by 15%.



11.2. Boosted Decision Trees 

A decision tree applies sequential cuts to the events but does not reject events
that fail the cuts. The choice of variables and cuts at each level of the tree is
made by training the trees on large sets of signal and background MC events.
Boosting averages the results over many trees and improves the performance by
about 20%. D0 pioneered the use of boosted decision trees (BDTs) to separate
signal from background in the single top search in 2006. They use custom
code with 64 input variables from the total list of 97, and 50 boosting cycles
with a separate set of BDTs for each of the 24 analysis channels. The same
variables are used in every analysis channel, since the BDTs ignore ones that do
not show sensitivity in any particular channel. With BDTs, there is also no need
to split the signal and background samples by subcomponent to improve the
sensitivity (which is beneficial with traditional neural networks), since they handle
the varying kinematics without problem. The CDF collaboration also uses BDTs,
recently included in the TMVA package in ROOT. They use 22 variables for 2-jet
events and 29 for 3-jet events, with 400-600 boosting cycles. They train four
sets of BDTs in total, since they combine electron and muon channels and the
two trigger types. After boosting, the distributions of both signal and background
are highly centralized between zero and one. In order to avoid using bins in the
final calculation with predicted signal or data but no predicted background, D0
transforms its output distributions (from all three discriminant methods, not just
BDTs) to ensure that every bin has at least 40 background events. This
transformation clusters the background events near zero and the signal events near one,
and avoids instabilities in the final cross section measurement.
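
The text does not specify the form of D0's transformation. As a stand-in that enforces the same requirement (every bin of the final distribution holds at least 40 predicted background events), the sketch below merges fine bins starting from the signal-like end of the discriminant. The function name, the merging strategy, and the example yields are all assumptions for illustration only.

```python
def merge_bins(background, min_bkg=40.0):
    """Merge fine bins so that each merged bin has >= min_bkg background.

    background: expected background per fine bin, ordered from low to
    high discriminant output.
    Returns a list of (start, stop) index ranges (stop exclusive),
    ordered from low to high discriminant output.
    """
    groups = []
    stop = len(background)
    running = 0.0
    # Walk from the signal-like (high) end toward the background-like end.
    for i in range(len(background) - 1, -1, -1):
        running += background[i]
        if running >= min_bkg:
            groups.append((i, stop))
            stop = i
            running = 0.0
    if stop > 0:
        # Leftover low bins with too little background: absorb them
        # into the lowest merged bin (or make a single bin if none exist).
        if groups:
            groups[-1] = (0, groups[-1][1])
        else:
            groups.append((0, stop))
    groups.reverse()
    return groups

# Hypothetical background yields in seven fine bins.
example_groups = merge_bins([500.0, 200.0, 90.0, 45.0, 30.0, 15.0, 5.0])
```

In this example the three sparsely populated bins at the signal-like end are combined into one bin with 50 background events, while the well-populated bins are left unchanged.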

11.3. Traditional Neural Networks 

D0 made the first particle search using neural networks (NNs) to separate signal
from background in 2001. The networks used were multilayer feed-forward
perceptrons from the MLPFIT package. CDF uses NNs in the observation analysis for
the lepton+jets channels with 14 input variables, and for the missing E_T+jets channels
with 11 variables. The networks in the lepton+jets channels come from the
commercial NEUROBAYES package. Despite its name, it is not a Bayesian NN
package as described in the next subsection. The networks are trained on the same
events as used with the BDTs to obtain the weights between nodes and thresholds
at the nodes. An independent set of events is used to test the networks after each
training cycle to avoid overtraining. Since NNs use all input variables (unlike BDTs,
which ignore ones not found to be useful), care must be taken not to include variables
with insufficient separation power uncorrelated from the other variables, otherwise
noise is introduced into the system and the separation can decrease. This is the
reason why far fewer variables are used with NNs than with BDTs.

11.4. Bayesian Neural Networks 

D0 introduced the use of Bayesian neural networks (BNNs) for signal-
background separation in the 2006 single top evidence analysis. Like traditional
NNs, a short list of input variables must be chosen, and D0 uses the RuleFit
package to select between 18 and 28 variables per analysis channel. The networks
have 20 hidden nodes. The Bayesian part of this technique is to average over many
networks in each channel using the Markov-Chain MC sampling technique. D0
uses 300 networks in each of the 24 analysis channels, with the final result in each
channel being taken from an average of the last 100 networks in the chain. This
averaging process makes the discrimination insensitive to details of which events
are used in training, so it is not possible to overtrain the networks, although closure
tests are performed using independent events to verify convergence. It is also not
necessary to split the signal and background components with separate networks
for optimal separation. The averaging process also improves the signal-background
separation, since it is not dependent on the choice of starting parameters for the
weights between nodes or thresholds at the nodes, which can lead to solutions at
local minima which are not optimal without the averaging.

11.5. Matrix Elements 

Both D0 and CDF use matrix elements (MEs) to separate signal from
background. D0 developed the method to measure the top quark mass
in 2004, and was the first to apply it to signal-background separation in
the 2006 single top evidence analysis. The matrix elements correspond to
signal and background probability densities. D0 calculates matrix elements for three
signal processes in the 2-jets channel and five in the 3-jets channel, together with
eight background processes in the 2-jets channel and three in the 3-jets channel.
The proton and antiproton are modeled using parton distribution functions and
detector resolutions are taken into account using jet resolution transfer functions.
The calculations are extremely CPU-intensive, and take many months to complete.
To improve the sensitivity, D0 splits the analysis into events with H_T < 175 GeV
(mainly W+jets background) or H_T > 175 GeV (mainly tt̄ and hard W+jets
background).

11.6. Likelihood Functions 

CDF has an analysis that uses likelihood functions for signal-background separation
with tb+tqb as signal. They also search separately for only the s-channel tb
process, using different likelihood functions and input variables. Likelihoods
are much simpler than NNs: they need no training on signal or background event
samples, and do not take correlations between the variables into account. In the
2-jet channels, CDF's likelihoods combine seven variables, including two powerful
ones: the logarithm of the matrix element, and the jet flavor separator. In the 3-jet
channels, 10 variables are combined.
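
A likelihood discriminant that ignores correlations can be sketched as a product of one-dimensional probability densities, one per input variable. The form used below, signal product divided by the sum of signal and background products, is a common choice and is assumed here; the text does not give CDF's exact construction. The binned densities, variable ranges, and function names are hypothetical.

```python
def binned_pdf(edges, fractions):
    """Return a lookup function for a normalized one-dimensional histogram."""
    def pdf(x):
        for low, high, fraction in zip(edges[:-1], edges[1:], fractions):
            if low <= x < high:
                return fraction
        return 0.0  # outside the histogram range
    return pdf

def likelihood_discriminant(values, signal_pdfs, background_pdfs):
    """Combine per-variable densities, ignoring correlations."""
    p_sig = 1.0
    p_bkg = 1.0
    for value, f_sig, f_bkg in zip(values, signal_pdfs, background_pdfs):
        p_sig *= f_sig(value)
        p_bkg *= f_bkg(value)
    if p_sig + p_bkg == 0.0:
        return 0.5  # no information from any variable
    return p_sig / (p_sig + p_bkg)

# Two hypothetical variables, each with two bins.
sig_pdfs = [binned_pdf([0.0, 100.0, 200.0], [0.3, 0.7]),
            binned_pdf([-1.0, 0.0, 1.0], [0.4, 0.6])]
bkg_pdfs = [binned_pdf([0.0, 100.0, 200.0], [0.8, 0.2]),
            binned_pdf([-1.0, 0.0, 1.0], [0.5, 0.5])]
example_output = likelihood_discriminant([150.0, 0.5], sig_pdfs, bkg_pdfs)
```

Values near one indicate a signal-like event and values near zero a background-like event, so the output can be histogrammed in the same way as the other discriminants.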

11.7. Combining the Discriminant Outputs 

The measurements from each discrimination method are correlated, but by less
than 100%, and the discriminant outputs may thus be combined to improve the
precision of the final measurement. D0 measures the correlation between its three
analysis methods (BDT, BNN, ME) using an ensemble of pseudodatasets containing
background and SM signal, and finds the correlation to be 74% between BDT and
BNN, 60% between BDT and ME, and 57% between BNN and ME. To combine
the three measurements in each of the 24 analysis channels, D0 uses an additional
set of BNNs, each with three inputs and six hidden nodes. CDF uses an innovative
method to combine its lepton+jets measurements: neuro-evolution of augmenting
topologies (NEAT). This is a method for evolving neural networks with a
genetic algorithm. Evolution starts with small simple networks that become
increasingly complex over sequential generations. The networks are trained to give the best
expected p-value (significance) for the result. This is unlike how traditional NNs are
optimized during training, when the error function (signal-background similarity)
is minimized. The NEAT networks are also used to optimize the binning for the
measurement. Figure 6 shows the final output distributions for all analysis channels
combined.

Fig. 6. Output distributions from (a) D0's BNN combination discriminant, for all analysis
channels combined, and (b) CDF's NEAT combination discriminant, for all lepton+jets analysis
channels combined.



12. Cross Section Measurements 

12.1. Bayesian Binned Likelihoods 

The distributions from the combination discriminants from 24 independent
lepton+jets analysis channels at D0 and eight lepton+jets plus three missing E_T+jets
channels at CDF are used in a Bayesian binned likelihood calculation to extract the
single top quark cross section. A flat nonnegative prior is used for the signal cross 
section. All systematic uncertainties on background normalization and shape and 
signal acceptance and their correlations are taken into account. The shape uncer- 
tainties from the jet energy scale are smoothed from bin to neighboring bin during 
the calculation. Using the full range of the discriminant outputs for this calculation 
means that the high statistics background-dominated region (near zero) is used to 
constrain the uncertainties on the much smaller background in the expected-signal- 
dominated region (near one). The signal cross section central value is taken from 
the position of the peak of the posterior density distribution, and the uncertainty 
on the cross section (statistical and systematic components combined) comes from 
the width of the distribution about the peak that encompasses 68% of its area 
(±1σ). The cross section calculations are also performed using the outputs from
each discriminant method separately, and using subsets of the data (all electron+jets 
channels, all 2-jets channels, all 1-tag channels, and so on) to check for consistency, 
which is found within the statistical uncertainty on the measurements. 
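
The core of such a calculation can be sketched in a few lines: a Poisson likelihood in each bin, a flat nonnegative prior on the signal cross section, and a posterior evaluated on a grid. This is a deliberately simplified illustration that omits all systematic uncertainties and their correlations; the bin contents, the signal yield per picobarn, and the function names are hypothetical.

```python
import math

def posterior(data, signal_per_pb, background, sigma_grid):
    """Posterior density on a grid of cross sections (flat prior, sigma >= 0)."""
    log_like = []
    for sigma in sigma_grid:
        total = 0.0
        for n, s, b in zip(data, signal_per_pb, background):
            mu = sigma * s + b  # expected events in this bin
            total += n * math.log(mu) - mu - math.lgamma(n + 1)
        log_like.append(total)
    best = max(log_like)
    weights = [math.exp(value - best) for value in log_like]
    norm = sum(weights)
    return [w / norm for w in weights]

def peak_and_interval(sigma_grid, post, level=0.68):
    """Peak position and the smallest grid region containing `level` of the area."""
    order = sorted(range(len(post)), key=lambda i: post[i], reverse=True)
    area = 0.0
    chosen = []
    for i in order:
        chosen.append(i)
        area += post[i]
        if area >= level:
            break
    return sigma_grid[order[0]], sigma_grid[min(chosen)], sigma_grid[max(chosen)]

# Toy three-bin discriminant; data built to match a 3 pb signal exactly.
grid = [i / 100.0 for i in range(0, 1001)]          # 0 to 10 pb
toy_signal_per_pb = [1.0, 5.0, 20.0]
toy_background = [400.0, 100.0, 20.0]
toy_data = [403, 115, 80]
toy_post = posterior(toy_data, toy_signal_per_pb, toy_background, grid)
toy_peak, toy_low, toy_high = peak_and_interval(grid, toy_post)
```

The first bin, which is dominated by background, contributes little to the position of the peak; in the full calculation its role is to constrain the background uncertainties, which this sketch does not model.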

12.2. Ensembles and Linearity Studies 

To check that the discriminants do not introduce a bias into the measured cross 
section, D0 generates eight ensembles of pseudodatasets and runs them through 
the entire analysis chain. Each ensemble contains about 7,000 sets of events, where
the sets are constructed to each reproduce D0's 2.3 fb^-1 real dataset. Signal and
background events are sampled from the MC event sets after all event selection cuts 
such that the numbers of each background component match the measured yields, 
smeared by Poisson statistics. All systematic uncertainties and their correlations 
between background and signal subcomponents are included in the calculations. 
The single top quark signal cross section is set at a different value spanning the 
range from 2 pb to 10 pb for each ensemble. For the three discrimination methods 
and for their combination, a plot is produced with the measured signal cross section 
versus the input signal cross section, and a fit made to the eight points. For all 
cases, the slope of the fitted relation is close to one and the intercept is close to 
the origin, which shows that the measured cross section, if it lies in this range, 
accurately represents the signal cross section in the data. 
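
The linearity check amounts to a straight-line fit of the measured cross section against the input cross section. The sketch below performs an unweighted least-squares fit; the eight input values and the measured values are invented for illustration and are not D0's ensemble results.

```python
def fit_line(x, y):
    """Unweighted least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical ensembles: eight input cross sections between 2 and 10 pb,
# with made-up mean measured values for each ensemble.
input_xsec = [2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 10.0]
measured_xsec = [2.05, 2.98, 4.03, 5.01, 5.96, 7.04, 7.97, 10.02]
slope, intercept = fit_line(input_xsec, measured_xsec)
```

A slope close to one and an intercept close to zero indicate that the analysis chain returns the cross section that was put in.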

12.3. Single Top Quark Production Cross Sections 

The measured single top quark cross sections are shown in Table 4. The expected
and measured significances of each measurement are also shown; these are explained 
in the next section. 



Table 4. Single top quark cross sections and significances from each analysis.

Analysis                             Single Top       Uncertainty      Significance
                                     Cross Section    [%]              Expected    Measured

D0
  Boosted Decision Trees             3.74 pb                           4.3σ        4.6σ
  Bayesian Neural Networks           4.70 pb                           4.1σ        5.4σ
  Matrix Elements                    4.30 pb                           4.1σ        4.9σ
  Combination (170 GeV)              3.94 ± 0.88 pb   ±22%             4.5σ        5.0σ

CDF
  Boosted Decision Trees             2.1 pb                            5.2σ        3.5σ
  Neural Networks                    1.8 pb                            5.2σ        3.5σ
  Matrix Elements                    2.5 pb                            4.9σ        4.3σ
  Likelihoods                        1.6 pb                            4.0σ        2.4σ
  Likelihoods, s-channel             1.5 pb                            1.1σ        2.0σ
  Combination, lepton+jets           2.1 pb           +29%, -24%
  Neural Networks, missing E_T+jets  4.9 pb           +52%, -46%       1.4σ        2.1σ
  Combination (175 GeV)              2.3 pb           +26%, -22%       > 5.9σ      5.0σ
  Combination (170 GeV)              2.35 pb          +24%, -21%

Tevatron Combination (170 GeV)       2.76 pb          +21%, -17%

Theory (M_top = 170 GeV)             3.46 ± 0.18 pb   ±5%


After the two collaborations submitted their independent measurements for 
publication, they worked together to combine them into one Tevatron result. The 
systematic uncertainty terms are classified to map between the two measurements 
so that correlations of some terms between the two measurements are properly taken 
into account. The combination is calculated using the Bayesian binned likelihood 
calculation on all input distributions simultaneously. The Tevatron combined result 
is also shown in Table 4. 
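
The idea of a Bayesian binned likelihood calculation can be sketched as follows. This is a deliberately simplified illustration, not the collaborations' implementation: it assumes a flat prior on a nonnegative cross section, a Poisson likelihood in each bin, a nonzero background in every bin, and no systematic uncertainties or correlations (all of which the real combination treats carefully). The bin contents are toy numbers and the function names are hypothetical.

```python
import math


def log_likelihood(sigma, observed, signal_per_pb, background):
    """Sum over bins of the Poisson log-likelihood for cross section sigma [pb]."""
    total = 0.0
    for n, a, b in zip(observed, signal_per_pb, background):
        mu = sigma * a + b  # expected events in this bin (requires b > 0)
        total += n * math.log(mu) - mu - math.lgamma(n + 1.0)
    return total


def posterior(observed, signal_per_pb, background, sigma_max=10.0, steps=1000):
    """Posterior on a grid, assuming a flat prior on 0 <= sigma <= sigma_max."""
    grid = [sigma_max * i / steps for i in range(steps + 1)]
    logs = [log_likelihood(s, observed, signal_per_pb, background) for s in grid]
    peak = max(logs)
    weights = [math.exp(v - peak) for v in logs]  # subtract peak to avoid underflow
    norm = sum(weights)
    return grid, [w / norm for w in weights]


def mode_and_interval(grid, density, level=0.68):
    """Posterior peak and the highest-density range holding `level` of the probability."""
    order = sorted(range(len(grid)), key=lambda i: density[i], reverse=True)
    covered, chosen = 0.0, []
    for i in order:
        chosen.append(i)
        covered += density[i]
        if covered >= level:
            break
    return grid[order[0]], grid[min(chosen)], grid[max(chosen)]


# Toy input: three discriminant bins from all channels placed side by side.
signal_per_pb = [2.0, 5.0, 10.0]   # expected signal events per pb of cross section
background = [200.0, 80.0, 20.0]   # expected background events
observed = [206, 95, 50]           # toy "data"

grid, density = posterior(observed, signal_per_pb, background)
mode, low, high = mode_and_interval(grid, density)
print(f"sigma = {mode:.2f} pb, 68% interval [{low:.2f}, {high:.2f}] pb")
```

Combining several analyses or experiments in this picture amounts to adding more bins to the product, with the peak of the posterior taken as the measured cross section and its width as the uncertainty.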

13. Measurement Significance 

The measured significance is defined from the p-value, which is the probability 
that the background alone fluctuates up to give a cross section measurement at least 
as large as the measured value. The expected significance comes from the p-value 
that is the probability that the background fluctuates up to give a cross section 
at least as large as the standard model theory value. The p-values are converted 
to significances in standard deviations (σ) assuming a Gaussian distribution. D0 
measures these p-values using an ensemble of about 70 million pseudodatasets, 
each consisting of only background events with no signal events, by determining the 
fraction of pseudodatasets with a high enough cross section. CDF calculates the 
p-values by finding how often the quantity -2 ln(Prob(data|S+B)/Prob(data|B)) is less 
in pseudodatasets than in real data. CDF's pseudodatasets are generated differently 
from D0's. Instead of sampling MC and multijet background events to generate each 
pseudodataset, they sample from the histograms of each distribution to perform the 
significance calculation. D0 finds a measured p-value of 2.5 x 10^-7 and CDF finds 
a measured p-value of 3.1 x 10^-7. The associated significances are shown in Table 4. 
Both experiments have a measured significance of 5.0σ, meeting the standard to 
claim "observation." 
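
The conversion from a p-value to a significance can be illustrated with a few lines of Python. The sketch assumes the one-sided Gaussian convention (the p-value is the area in the upper tail) and uses the measured p-values of 2.5 x 10^-7 (D0) and 3.1 x 10^-7 (CDF). The helper names are hypothetical, and `ensemble_p_value` only mimics the counting step of a pseudodataset ensemble, not the generation of the pseudodatasets themselves.

```python
from statistics import NormalDist


def p_value_to_sigma(p_value):
    """One-sided Gaussian significance (in standard deviations) for a p-value."""
    # By symmetry, the upper-tail quantile is minus the lower-tail quantile.
    return -NormalDist().inv_cdf(p_value)


def ensemble_p_value(pseudo_results, measured):
    """Fraction of background-only pseudodatasets at or above the measured value."""
    passing = sum(1 for value in pseudo_results if value >= measured)
    return passing / len(pseudo_results)


for label, p in [("D0", 2.5e-7), ("CDF", 3.1e-7)]:
    print(f"{label}: p = {p:.1e} -> {p_value_to_sigma(p):.1f} sigma")
```

Under this convention both p-values correspond to 5.0 standard deviations after rounding. With a p-value of a few times 10^-7, only a handful of pseudodatasets per ten million pass the threshold, which is why an ensemble of tens of millions is needed to measure it.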

14. Measuring the CKM Matrix Element |V_tb| 

The Cabibbo-Kobayashi-Maskawa matrix describes the mixing between quarks 
to get from the strong-interaction eigenstates to the weak-interaction ones. The 
term relating top quarks to bottom quarks is known as V_tb. The single top quark 
production cross section is proportional to |V_tb|^2 and can thus be used to measure 
the magnitude of V_tb. To make this measurement, the collaborations assume the 
standard model for top quark decay (i.e., mostly to Wb and not much to Wd or Ws) 
and that the Wtb coupling is left-handed and CP-conserving. They do not, however, 
assume that there are exactly three quark generations for this measurement. That 
is, they do not assume CKM matrix unitarity, unlike measurements of |V_tb| made 
using top quark decays in top-antitop pairs. The measurements include uncertainties 
from the tb+tqb theory cross section as well as those included in the cross section 
measurement. The theory cross section uncertainty from the top quark mass 
uncertainty is 4.2%, with 3.0% from the PDFs, 2.4% from the factorization scale, and 
0.5% from the strong coupling constant α_s. Two measurements of |V_tb| are made: 
the first does not constrain the strength of the left-handed scalar coupling constant 
f_1^L (where a nonnegative prior is used, with no upper bound), and the second sets 
f_1^L = 1 (when the prior is bounded between zero and one). The results of this 
measurement from D0, CDF, and the Tevatron combination are shown in Table 5. 
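
Because the cross section scales as |V_tb|^2, a rough estimate of the unconstrained result follows from the square root of the ratio of measured to theory cross sections. The sketch below is a back-of-the-envelope check with simple Gaussian error propagation, not the Bayesian posterior calculation the collaborations actually use. It takes the D0 combination (3.94 ± 0.88 pb) and the theory value (3.46 ± 0.18 pb) from Table 4, and the function name is hypothetical.

```python
import math


def vtb_from_cross_section(sigma_meas, err_meas, sigma_theory, err_theory):
    """Estimate |V_tb f_1^L| = sqrt(sigma_meas / sigma_theory) with a naive error."""
    value = math.sqrt(sigma_meas / sigma_theory)
    # Relative uncertainty on the ratio: measurement and theory added in quadrature.
    rel_err_ratio = math.hypot(err_meas / sigma_meas, err_theory / sigma_theory)
    # A square root halves the relative uncertainty.
    return value, 0.5 * rel_err_ratio * value


value, error = vtb_from_cross_section(3.94, 0.88, 3.46, 0.18)
print(f"|V_tb f_1^L| ~ {value:.2f} +/- {error:.2f}")
```

This naive estimate lands close to the D0 value quoted in Table 5, which suggests that the measured cross section and its uncertainty dominate the result. The limits for f_1^L = 1 cannot be reproduced this way, since they depend on the prior being bounded between zero and one.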

Table 5. Measurements of the CKM matrix element |V_tb|. 

Experiment and Theory                   |V_tb f_1^L|    |V_tb| (f_1^L = 1), 95% CL

D0                                      1.07 ± 0.12     0.78 < |V_tb| < 1
  Mtop = 170 GeV, Kidonakis 2006

CDF                                                     0.71 < |V_tb| < 1
  Mtop = 175 GeV, Harris et al. 2002

Tevatron Combination                                    0.77 < |V_tb| < 1
  Mtop = 170 GeV, Kidonakis 2006

15. Separate s-Channel and t-Channel Measurements 

Both collaborations have made measurements of the s-channel tb and t-channel tqb 
single top quark cross sections separately. D0 retrains the three sets of discriminants 
with just t-channel single top as the signal, instead of tb+tqb as in the observation 
analysis. CDF uses measurements obtained during their main analysis. Neither 
of these measurements assumes the SM ratio for the s-channel and t-channel cross 
sections, unlike in the observation analysis. The results are shown in Fig. 7. D0's 
t-channel cross section measurement has a significance of 4.8σ. 



[Figure 7: panel (a) D0, 2.3 fb^-1; panel (b) CDF, 3.2 fb^-1. Axis label: s-channel cross section [pb].] 

Fig. 7. Plots showing the separate s-channel tb and t-channel tqb cross section measurements from 
(a) D0 and (b) CDF, together with theory values and some beyond-the-SM model predictions. 
D0's measurements are for Mtop = 170 GeV and CDF's for 175 GeV. The theory cross sections 
shown are Kidonakis 2006 for D0, Harris et al. 2002 for CDF ("NLO") and Kidonakis 2006 for 
CDF ("NNNLO"). 



16. Summary 

The D0 and CDF collaborations have searched large Tevatron datasets and 
observed single top quark production for the first time, with 5σ significance for 
each of the measurements. The measured cross sections are consistent with NLO 
theory predictions. Over the years of the search, the analyses have been improved 
in many ways to achieve this, in particular: 

D0's Innovations 

• Next-to-leading order simulation of signals with full spin information included 

• Very loose kinematic cuts and use of all possible triggers to select more signal-like 
data and increase signal acceptance 

• Multivariate techniques: BDTs, BNNs, and MEs used to improve signal- 
background separation 

• Very large number of variables, including many not used at the Tevatron before 
such as jet widths, to separate signal from background 

• Rebinning the discriminant outputs to ensure that no bin contains data or expected 
signal without any background events, which stabilizes the cross section measurement 

CDF's Innovations 

• Including data with no identified lepton to extend the signal acceptance 

• Jet flavor separator variable to discriminate b jets from mistagged charm, light 
quark, and gluon ones after b tagging 

• Combining different measurements using the NEAT algorithm optimized to 
maximize the expected signal significance 

• Binned likelihood fit to a discriminating variable shape to improve the 
measurement sensitivity 

As the reader can see from these lists and the previous analysis descriptions, each 
collaboration has learned from the innovations of the other one, and the outcome 
is a deep understanding of the signals and multicomponent backgrounds in many 
analysis channels, with powerful new analysis techniques developed to extract a 
small signal from a very large background. Many of these techniques are now being 
applied to the search for the Higgs boson at the Tevatron, and the single top datasets 
are providing a unique place in which to test various aspects of the standard model 
and search for new physics. 